    The Exploitation of Web Navigation Data: Ethical Issues and Alternative Scenarios

    Nowadays, users' browsing activity on the Internet is not completely private, because many entities collect and use such data for either legitimate or illegal purposes. The implications are serious, from a person unknowingly exposing private information to an unknown third party, to a company being unable to control the flow of its information to the outside world. As a result, users have lost control over their private data on the Internet. In this paper, we present the entities involved in the collection and use of users' data. We then highlight the ethical issues that arise for users, companies, scientists and governments. Finally, we present some alternative scenarios and suggestions for these entities to address such ethical issues. Comment: 11 pages, 1 figure

    Using Passive Measurements to Demystify Online Trackers

    The Internet revolution has led to the rise of trackers—online tracking services that shadow users’ browsing activity. Despite trackers’ pervasiveness, few users install privacy-enhancing plug-ins

    Method for detecting web tracking services

    Method for detecting web tracking services during browsing activity performed by clients having associated client identifiers, the method comprising the steps of: extracting the key-value pairs contained in navigation data; looking for a one-to-one correspondence between said client identifiers and the values contained in said keys; and selecting the keys for which a client-value one-to-one correspondence is observed for at least a predetermined number of clients, said keys identifying the associated services as services performing tracking activities.
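
    The core idea above (selecting query keys whose values are in one-to-one correspondence with client identifiers) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration only: the function names, the MIN_CLIENTS threshold and the (client_id, url) input format are not taken from the patent.

```python
# Hypothetical sketch of the key/value correspondence check; names,
# threshold and input format are assumptions, not the patented method.
from collections import defaultdict
from urllib.parse import urlparse, parse_qsl

MIN_CLIENTS = 10  # assumed minimum number of distinct clients to flag a key

def tracking_keys(transactions):
    """transactions: iterable of (client_id, url) pairs from passive traces.

    Returns the query keys whose values are in one-to-one correspondence
    with client identifiers, i.e., keys that appear to carry user identifiers."""
    values_per_client = defaultdict(lambda: defaultdict(set))  # key -> client -> values
    clients_per_value = defaultdict(lambda: defaultdict(set))  # key -> value -> clients

    for client_id, url in transactions:
        for key, value in parse_qsl(urlparse(url).query):
            values_per_client[key][client_id].add(value)
            clients_per_value[key][value].add(client_id)

    flagged = []
    for key, per_client in values_per_client.items():
        # One-to-one: every client always sends the same value for this key,
        # and every value is sent by exactly one client.
        one_to_one = (
            all(len(vals) == 1 for vals in per_client.values()) and
            all(len(cls) == 1 for cls in clients_per_value[key].values())
        )
        if one_to_one and len(per_client) >= MIN_CLIENTS:
            flagged.append(key)
    return flagged
```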

    A method for exploring traffic passive traces and grouping similar urls

    Computer security method for the analysis of passive traces of HTTP and HTTPS traffic on the Internet, with extraction and grouping of similar Web transactions automatically generated by malware, malicious services, unsolicited advertising or other sources. The method comprises at least the following processing and control steps: a) URL extraction from an operational network, using passive exploration of the HTTP and HTTPS traffic data, and subsequent collection of the extracted URLs into batches; b) detection of similar URLs, by computing metrics based on the distance among URLs, namely on a measure of the degree of diversity between the pairs of character strings composing the URLs; c) activation of one or more clustering algorithms to group the URLs according to the similarity metrics and to obtain, within each group of URLs, elements with similar and homogeneous features, suitable to be analyzed as a single entity; d) visualization of the elements sorted by the degree of cohesion of the URLs contained in each grouping.
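
    Step (b) relies on a distance expressing the degree of diversity between the character strings composing two URLs. The abstract does not name a specific metric, so the short sketch below uses Python's difflib similarity ratio purely as an illustrative stand-in.

```python
# Illustrative stand-in for the URL distance of step (b); the actual method
# may use a different string metric.
from difflib import SequenceMatcher

def url_distance(u1: str, u2: str) -> float:
    """Distance in [0, 1]: 0 for identical URLs, 1 for fully dissimilar ones."""
    return 1.0 - SequenceMatcher(None, u1, u2).ratio()

# URLs generated by the same service usually differ only in a few parameter
# values, so their mutual distance stays small:
print(url_distance("http://ads.example.com/track?id=123",
                   "http://ads.example.com/track?id=456"))    # small
print(url_distance("http://ads.example.com/track?id=123",
                   "http://cdn.other.org/lib/jquery.min.js"))  # much larger
```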

    Big Data Methodologies and Applications to Privacy and Web Tracking in the Internet

    While on the Internet, individuals encounter invisible services that collect personal information, also known as third-party Web trackers (trackers for short). Linked to advertisement, social sharing, and analytics services in general, hundreds of companies de facto track and build profiles of people. Individuals thus leak personal and corporate information to trackers whose (legitimate or not) businesses revolve around the value of the collected data. The implications are serious, from a person unwillingly exposing private information to an unknown third party, to a company being unable to control the flow of its information to the outside world. As a result, users have lost control over their private data on the Internet. The scope of this thesis is threefold: first, to show how popular Web trackers are and how users are involved in this phenomenon; second, to propose algorithms and methodologies to automatically pinpoint these services and, more generally, malicious traffic; third, to introduce CROWDSURF, a platform for comprehensive and collaborative auditing of the data that flows to Internet services. Many results show a worrying scenario. Web trackers are omnipresent: they are embedded in almost all websites (more than 70%), including the most popular ones, and some of them are able to continuously track 98% of Internet users. Users, who often know nothing about the phenomenon, rely on countermeasures that suffer from many problems and sometimes operate without clarity or transparency. To provide new tools that overcome the limitations of current solutions, I propose two automatic methodologies. Both algorithms show excellent results: using a very small dataset, the first methodology identifies 34 new third-party Web trackers not present in available blacklists, while the second perfectly clusters malicious traffic, e.g., malware, advertising services, and third-party tracking services. These methodologies could easily be used to build a new generation of anti-tracking solutions, overcoming one of the biggest problems of the current generation, namely that trackers are pinpointed manually. Finally, CROWDSURF outlines the features that an anti-tracking solution should have. The platform is still preliminary and presents practical challenges that must be faced, but it could be the milestone for a new generation of solutions able to give users back control of the information exchanged on the Internet.

    Unsupervised Detection of Web Trackers

    When browsing, users are consistently tracked by parties whose business builds on the value of the collected data. The privacy implications are serious. Consumers and corporations do worry about the information they unknowingly expose to the outside world, and they call for mechanisms to curb this leakage. Existing countermeasures to web tracking rely on hostname blacklists whose origin is impossible to know and which must be continuously updated. This paper presents a novel, unsupervised methodology that leverages application-level traffic logs to automatically detect services running some tracking activity, thus enabling the generation of curated blacklists. The methodology builds on an algorithm that pinpoints pieces of information containing user identifiers exposed in URL queries in HTTP(S) transactions. We validate our algorithm on an artificial dataset obtained by visiting the 200 most popular websites in the Alexa rank. Results are excellent: our algorithm identifies 34 new third-party trackers not present in available blacklists. By analyzing the output of our algorithm, some privacy-related interactions emerge. For instance, we observe scenarios clearly hinting at the Cookie Matching practice, in which information about users' activity gets shared across several different third parties.
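
    The paper's stated goal is to turn the detected tracking activity into curated blacklists. The following is a hypothetical sketch of that final step, assuming a detector callable (for example, the one-to-one key/value check sketched earlier) and (client_id, url) traffic logs; none of these names or interfaces come from the paper.

```python
# Hypothetical sketch of the blacklist-generation step; the detector function,
# the (client_id, url) log format and the per-hostname grouping are assumptions.
from collections import defaultdict
from urllib.parse import urlparse

def build_blacklist(transactions, detect_tracking_keys):
    """transactions: iterable of (client_id, url) pairs from HTTP(S) logs.
    detect_tracking_keys: callable returning the query keys of a hostname's
    transactions that carry user identifiers."""
    per_host = defaultdict(list)
    for client_id, url in transactions:
        per_host[urlparse(url).hostname].append((client_id, url))

    blacklist = set()
    for host, host_transactions in per_host.items():
        # Flag the hostname if at least one of its query keys exposes
        # a user identifier.
        if detect_tracking_keys(host_transactions):
            blacklist.add(host)
    return sorted(blacklist)
```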

    CLUE: Clustering for Mining Web URLs

    The Internet has witnessed the proliferation of applications and services that rely on HTTP as their application protocol. Users play games, read emails, watch videos, chat and access web pages using their PC, which in turn downloads tens or hundreds of URLs to fetch all the objects needed to display the requested content. As a result, billions of URLs are observed in the network. When monitoring the traffic, it thus becomes more and more important to have methodologies and tools that allow one to dig into this data and extract useful information. In this paper, we present CLUE, Clustering for URL Exploration, a methodology that leverages clustering algorithms, i.e., unsupervised techniques developed in the data mining field, to extract knowledge from the passive observation of URLs carried by the network. This is a challenging problem given the unstructured format of URLs, which, being strings, call for specialized approaches. Inspired by text-mining algorithms, we introduce the concept of URL-distance and use it to compose clusters of URLs using the well-known DBSCAN algorithm. Experiments on actual datasets show encouraging results: well-separated and consistent clusters emerge and allow us to identify, e.g., malicious traffic, advertising services, and third-party tracking systems. In a nutshell, our clustering algorithm offers the means to gain insight into the data carried by the network, with applications in the security and privacy protection fields.
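
    As a rough illustration of the approach described above, the sketch below builds a pairwise URL-distance matrix and feeds it to scikit-learn's DBSCAN implementation. The string distance and the eps/min_samples values are illustrative guesses, not the URL-distance definition or the parameters used in CLUE.

```python
# Rough sketch of a CLUE-like pipeline: pairwise URL distances fed to DBSCAN.
# Distance function and eps/min_samples are illustrative, not the paper's.
# Requires numpy and scikit-learn.
import numpy as np
from difflib import SequenceMatcher
from sklearn.cluster import DBSCAN

def url_distance(u1, u2):
    return 1.0 - SequenceMatcher(None, u1, u2).ratio()

def cluster_urls(urls, eps=0.3, min_samples=3):
    n = len(urls)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = url_distance(urls[i], urls[j])

    # DBSCAN accepts a precomputed distance matrix; label -1 marks noise.
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit_predict(dist)

    clusters = {}
    for url, label in zip(urls, labels):
        clusters.setdefault(label, []).append(url)
    return clusters
```

    URLs that share a common template (for instance, the same tracking or advertising endpoint queried with different identifiers) end up in the same cluster and can then be inspected as a single entity.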

    Evaluation of Some Cardiac Functions among Children with Vitamin D Deficiency Rickets

    Vitamin D deficiency rickets (VDDR) is a commonly recognised disease in Egypt. The most striking biochemical finding in this illness is hypocalcaemia, which may affect ventricular contraction. This was a prospective, hospital-based study aiming to evaluate some cardiac functions among children with VDDR by echocardiography. Patients and Methods: the study included 100 patients with VDDR (Group 1) in addition to a control group (Group 2) of 50 healthy children. All cases were subjected to a thorough history, full clinical examination and investigations including serum calcium, phosphorus, alkaline phosphatase, 25(OH) vitamin D, parathormone, chest X-ray, electrocardiogram and echocardiography to measure left ventricular systolic function parameters, namely ejection fraction (EF%), fractional shortening (FS%), left ventricular end-diastolic diameter (LVEDD) and left ventricular end-systolic diameter (LVESD). Results: EF% and FS% were significantly lower, while LVEDD and LVESD were significantly higher, among the studied VDDR cases (Group 1) when compared with controls (Group 2). These echocardiographic parameters improved with vitamin D and calcium treatment. Conclusions: children with VDDR have a significant impairment in left ventricular systolic function, which improved with appropriate treatment.